Results 1 - 5 of 5
1.
J Am Med Inform Assoc ; 30(6): 1022-1031, 2023 05 19.
Article in English | MEDLINE | ID: covidwho-2265425

ABSTRACT

OBJECTIVE: To develop a computable representation for medical evidence and to contribute a gold standard dataset of annotated randomized controlled trial (RCT) abstracts, along with a natural language processing (NLP) pipeline for transforming free-text RCT evidence in PubMed into the structured representation. MATERIALS AND METHODS: Our representation, EvidenceMap, consists of 3 levels of abstraction: Medical Evidence Entity, Proposition and Map, to represent the hierarchical structure of medical evidence composition. Randomly selected RCT abstracts were annotated following EvidenceMap based on the consensus of 2 independent annotators to train an NLP pipeline. Via a user study, we measured how EvidenceMap improved evidence comprehension and analyzed its representative capacity by comparing evidence annotation that followed the EvidenceMap representation with annotation that followed no specific guidelines. RESULTS: Two corpora including 229 disease-agnostic and 80 COVID-19 RCT abstracts were annotated, yielding 12 725 entities and 1602 propositions. EvidenceMap saves users 51.9% of the time compared to reading raw-text abstracts. Most evidence elements identified during the freeform annotation were successfully represented by EvidenceMap, and users gave the enrollment, study design, and study results sections mean 5-point Likert ratings of 4.85, 4.70, and 4.20, respectively. The end-to-end evaluations of the pipeline show that the evidence proposition formulation achieves an F1 score of 0.84 and an adjusted Rand index of 0.86. CONCLUSIONS: EvidenceMap extends the participant, intervention, comparator, and outcome framework into 3 levels of abstraction for transforming free-text evidence from the clinical literature into a computable structure. It can be used as an interoperable format for better evidence retrieval and synthesis and as an interpretable representation for efficiently comprehending RCT findings.


Subject(s)
COVID-19, Comprehension, Humans, Natural Language Processing, PubMed
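
The "adjusted random index" mentioned in the abstract above presumably refers to the adjusted Rand index, a standard chance-corrected measure of agreement between two groupings of the same items. The following is a minimal sketch of that standard formula only; the function name and the toy labels are illustrative assumptions and are not taken from the paper or its pipeline.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two groupings of the same items.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no item pairs to disagree on

    # Contingency counts: items sharing a (true group, predicted group) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together, per cell, per row, per column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. everything in a single group
    return (sum_cells - expected) / (max_index - expected)


# Toy example: the same grouping under different label names.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
# Toy example: every predicted group splits every true group.
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two groupings are identical up to renaming, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.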
2.
Information Processing and Management ; 60(3), 2023.
Article in English | Scopus | ID: covidwho-2233026

ABSTRACT

The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition taking into account different variants of Spanish in social posts; (2) to evaluate the efficiency of semi-supervised models for extending such a corpus with unlabelled posts; and (3) to describe such short text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers as in favour (904), against (674) or neither (1,223), with a Fleiss' kappa score of 0.725. Results show that the self-training method with an SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro-averaged F1 score of 0.94. The combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of the trustworthiness and the validation index. © 2023 The Author(s)
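
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss' kappa formula for a fixed number of annotators per item. The function name and the toy rating table are assumptions made for illustration; they are not the paper's data or code.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of per-item category counts.

    `ratings[i][j]` is the number of annotators who assigned item i to
    category j; every row must sum to the same number of annotators.
    Hypothetical helper for illustration only.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(ratings[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        return 1.0  # all ratings in one category; convention chosen for this sketch
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy table: 3 annotators, categories (in favour, against, neither).
unanimous = fleiss_kappa([[3, 0, 0], [0, 3, 0], [0, 0, 3]])
# Toy table: 3 annotators, 2 categories, a 2-to-1 split on every item.
split = fleiss_kappa([[2, 1], [1, 2]])
```

Values close to 1 indicate near-unanimous labelling, while values at or below 0 indicate agreement no better than chance.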

3.
Information Processing & Management ; : 103294, 2023.
Article in English | ScienceDirect | ID: covidwho-2210541

ABSTRACT

The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition taking into account different variants of Spanish in social posts; (2) to evaluate the efficiency of semi-supervised models for extending such a corpus with unlabelled posts; and (3) to describe such short text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers as in favour (904), against (674) or neither (1,223), with a Fleiss' kappa score of 0.725. Results show that the self-training method with an SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro-averaged F1 score of 0.94. The combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of the trustworthiness and the validation index.

4.
2022 Workshop on Creating, Enriching and Using Parliamentary Corpora, ParlaCLARIN III 2022 ; : 117-124, 2022.
Article in English | Scopus | ID: covidwho-2167388

ABSTRACT

This paper describes the process of acquisition, cleaning, interpretation, coding and linguistic annotation of a collection of parliamentary debates from the Senate of the Italian Republic, covering the COVID-19 pandemic emergency period and an earlier period for reference and comparison, according to the CLARIN ParlaMint prescriptions. The corpus contains 1199 sessions and 79,373 speeches for a total of about 31 million words, and was encoded according to the ParlaCLARIN TEI XML format. It includes extensive metadata about the speakers, sessions, political parties and parliamentary groups. As required by the ParlaMint initiative, the corpus was also linguistically annotated for sentences, tokens, POS tags, lemmas and dependency syntax according to the Universal Dependencies guidelines. Named entity annotation and classification are also included. All linguistic annotation was performed automatically using state-of-the-art NLP technology with no manual revision. The Italian dataset is freely available as part of the larger ParlaMint 2.1 corpus deposited and archived in the CLARIN repository together with all other national corpora. It is also available for direct analysis and inspection via various CLARIN services and has already been used for both research and educational purposes. © European Language Resources Association (ELRA).

5.
Journal of Pragmatics ; 191:256-270, 2022.
Article in English | ScienceDirect | ID: covidwho-1706577

ABSTRACT

This article presents a novel framework for examining how emotional labor is performed linguistically. Bringing together Arlie Hochschild's pioneering sociological work and insights from the linguistic literature on emotion, the framework aims to capture the discursive mechanisms through which workers express, background and manage emotions in fulfilling their professional roles. We demonstrate the framework through a case study of a corpus of Twitter interactions involving passengers and airline customer service agents during the first wave of the Covid-19 pandemic. Following recent calls for triangulation in corpus linguistics, we explore the corpus using three complementary methods: lexical, move and dialogic analysis. From a theoretical perspective, this study contributes to improving our understanding of the pervasive phenomenon of emotional labor. From an applied perspective, it offers a new approach for assessing communication practices in various professional contexts.
